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Abstract. In previous work, the author has extended the concept of regular 
and irregular primes to the setting of arbitrary totally real number fields fco, 
using the values of the zeta function at negative integers as our "higher 
Bernoulli numbers". In the case where kg is a real quadratic field, Siegel 
presented two formulas for calculating these zeta-values: one using entirely el- 
ementary methods and one which is derived from the theory of modular forms. 
(The author would like to thank Henri Cohen for suggesting an analysis of the 
second formula.) We briefly discuss several algorithms based on these formulas 
and compare the running time involved in using them to determine the index 
of feo-irregularity (more generally, "quadratic irregularity") of a prime number. 



1. Definitions 

Let fco be a totally real number field, and let p be an odd prime. Let k\ — fco(Cp)> 
where C, p n will denote a primitive p n -th root of unity. Let A = Gal(fci/fc ), and let 
S = |A|. Let p e be the largest power of p such that Cp c € ^o(Cp)- 

Definition 1. Let (k De the zeta function for k$. We say that p is ko-regular if p 
is relatively prime to Cfc (l — 2m) for all integers m such that 2 < 2m < 5 — 2 and 
also p is relatively prime to p e (fc (l — S). The number of such zeta-values that are 
divisible by p will be the index of fco -irregularity of p. 

According to a well-known theorem of Kummer, p divides the order of the class 
group of Q(Cp) if and only if p divides the numerator of a Bernoulli number B m for 
some even m such that 2 < m < p — 3. Such primes are called irregular; the others 
are called regular. In the setting we have described above, the author proved in 
his thesis ([7], see also [8]), building on work of Greenberg and Kudo, that under a 
certain technical condition Kummer's criterion can be extended to give information 
about whether p divides the class group of fco(Cp)- To be exact, let fc^ denote the 
maximal real subfield of k\, which is equal to fco(Cp + Cp -1 )- L e t, h(kx) denote the 
class number of fci and h + (k\) denote the class number of . It is known that 
h + (ki) | h(ki); we let the relative class number h~(k±) be the quotient. 

Theorem 1 (Greenberg, Holden). Assume that no prime of the field lying over 
p splits in k\. Then p divides h~(ki) if and only if p is not kg-regular. 

As an application, we note that one common way of constructing public-key 
cryptographic systems is to utilize the problem of finding a discrete logarithm in 
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some abelian group. In order to make sure that the discrete logarithm problem 
is computationally hard, one needs to know something about the structure of the 
group involved, e.g. that it is divisible by a large prime. Theorem 1 shows that 
if p is a large fco- irregular prime and the conditions of the theorem are met, then 
the class group of &o(Cp) ma y be especially suitable for cryptography. (One should 
see [3] for more on the use of class groups in cryptography.) 

For the case we consider, ko will be a real quadratic field Q(y/D), with D a 
positive fundamental discriminant. For such a fco, we will say that primes are ir- 
regular or have given index of D-irregularity, and we will let the zeta function £fc 
be also denoted by (d ■ (More generally, we may refer to the concept as "quadratic 
irregularity".) In this case 5 will be equal to p — 1 unless D — p, in which case 
5 = (p — l)/2. Also, e is always equal to 1 when p does not divide the order of fc 
over Q, which is true in this case since p is odd. For the condition in Theorem 1 
that no prime of the field k^ lying over p splits in k\ to be satisfied it is sufficient 
that p should not divide D, and we should also note that since p does not divide 
the degree of fco = Q(VD) over Q, a theorem of Leopoldt shows that p divides 
h(ki) if and only if p divides h~(ki). 

In general, we will consider three cost models for the time of multiplication: 
first using naive multiplication (0(tt')), second using Schonhage-Strassen fast mul- 
tiplication or a similar method (0(t\g(t')°^), and third using a model where 
multiplication (or addition) takes constant time regardless of the size of the factors 
(O(l)). We do not expect constant time multiplication to occur asymptotically in 
the real world, but it can provide useful estimates in situations where the size of 
the numbers involved is small compared to the word size of the actual computer 
in question. (In these running time bounds, t is the number of bits in the larger 
multiplicand and t' the number of bits in the smaller.) 



2. First formula 

Siegel's first formula to compute Cd(1 — 2m) for to > 1 an integer is analogous 
to the formula ((1 — 2m) = — i?2 m /(2m). Using elementary methods, Siegel showed 
that similarly 

(1) W 2m) = ^D 2m -if2x(j)B2m(j/D). 

Here x(j) = (j)' tnc Kroncckcr symbol, and Bi m {jjD) indicates the 2m-th 
Bernoulli polynomial evaluated at the fraction j/D. The Bernoulli polynomial 
B r (x) can be computed from the Bernoulli numbers as 




It is not difficult to estimate the sizes of the numbers involved. We will assume 
throughout that B m , 1 < to < M, are precomputed over the common denominator 
of the final result, and stored in this fashion each has size 0(m(\gm + IgD)) bits 
for a total table size of 0(M 2 (lgM + IgD)) bits. (The precomputation does not 
in fact add to the asymptotic running time.) The rational numbers B<i m {jjD) can 
also be stored in 0(m(lgm + lg-D)) bits, as can the total. See [6] for more details. 
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A first attempt at an algorithm based on (1) might compute B (a), . . . , Bm{<x) 
naively from the formula. The time taken for this would be dominated by the 
powerings. For a = a/b some rational number, the total time with naive multi- 
plication would be 0(M 4 (lgM + lga + lg6)°( 1 ^). Using fast multiplication instead 
of naive multiplication would improve this to (9(M 3 (lgM + Iga + lgb)°^), while 
with constant time multiplication we need only time 0(M 2 lg M) regardless of a 
and b. 

However we can do better than this, using a cross between Horner's method 
of evaluating polynomials and an algorithm used by Brent to calculate Bernoulli 
numbers, as was previously discussed by the author in [6] . This method gives a total 
time of 0(M 3 (\gM + lga + lgo) ' 1 -*) using either constant or fast multiplication, 
and time 0{M 2 ) using constant time multiplication. 

Using either of these algorithms to compute Cr>(l — 2m), 2 < 2m < M, is then 
relatively straightforward. Note that the Kroncckcr symbol x(j) can be computed 
in time 0(lg 2 D). The slower version of the algorithm has time 0(M 4 D(lgM + 
l g D)°W) with naive multiplication, 0(M 3 D(lgM + lgL>) 0(1) ) with fast multipli- 
cation, and 0{M 2 Di\g M +lg D)°^) with constant time multiplication. The faster 
version runs in time 0(M 3 D(lgD + lgM) ^ 1 )) with either naive or fast multipli- 
cation (the 0(1) factor is different, of course) and again in time 0(M 2 D(\gM + 
\gD)°^) with constant time multiplication. 

3. Second formula 

Siegel's second formula is, as I said, derived from the theory of modular forms. 
In general, for fc a totally real number field as above, it says that 

r 

C feo (l - 2m) = -2" C ^^T C2 „ m ^(2m), 
1=1 

where n = [ko : Q], c 2m „ = C2 mn ,o an d C2 mn ,i are rational integers depending only 
on 2mn and I (given by explicit formulas which we will discuss), 




Lmn/6J if 2mn = 2 modulo 12 

[mn/6j + 1 otherwise, 



and Si° is a sum over norms of ideals in the ring of integers of k , namely 

sf°(2m)= Yl ff2m-i(M9), 

^e(o)- 1 , i/»o, tr(i/)=; 

where 

a 2m _ 1 (a) = ^iV(58) 2ro - 1 

<8|2l 

is a generalization of the usual sum of powers function and d is the different of /co- 
in the quadratic case this all becomes much easier: 

r 

(2) CD(l-2m) = -4c^5^C4m,!*?(2m), r = [m/3\ + 1, 

i=i 

s? (2m) = Yl <r2m-i(("VD)), 
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and sf(2m) can also be expressed in terms of a purely arithmetic function e 2m -i(n), 
as follows: 



S f°(2m) = y2x D (j)j 2m - 1 e 2m -i((l/j) 2 D) 



and 



where 



i(»)= E 



e-im-\\n) = a ^ 

~n (mod 4) 
\x\<s/n 



d\n 

is the usual sum-of-powers function. (See [10], [11], [4] and [5] for more detailed 
descriptions of these formulas, and for their derivations.) The coefficients C4 m ,i 
are most easily expressed as the coefficients of a certain power series, and can be 
computed as needed without adding to the asymptotic running time. We will give 
explicit formulas for the power series and discuss its computation in Section 4. It 
is not hard to prove that in this form C4 mj ; is of size 0(m). The running time for 
calculating the function e2 m -i(n) is complicated by the need for factoring; we use 
here an estimate based on the elliptic curve factoring method (we would expect 
something very similar with any of the other standard subexponential methods) to 
get an expected running time involving the function L(x) = e^ [ ° sxl ° s[ogx . Given 
this, we get an expected running time to compute e 2m -i(n) of 

0(V^L(n) 1+o(1) + (2m - l) 2 V^lg 2 n) 

using naive multiplication. If we now applied (2) as written to compute all Cn(l — 2m), 
2 < 2m < M, we would get a running time of 

0(M 3 Vl)L(M) 0(1) L(D)° (1) lgM + M 5 VZ> lgM(lgM + lgD)° (1) ), 

again using naive multiplication. 

However, it is more efficient to rearrange the terms of the formula as follows: 

r 

Cz?(l-2m) = -4c4„ 1 l ^c 4m ,;sf (2m) 

l=i 

= -4c 4 ™ E C W E Xn(j)j 2m - 1 e 2m - 1 ((l/j) 2 D) 
*=i j\i 

r (Vr/k\ \ 

(3) = -4c 4 -^ XdU).] 2 " 1 - 1 ^^ e 2ro _ 1 (fc 2 £>) 

fe=i \ i=i / 

This rearrangement of the formula requires fewer calls to compute e2 m -i by a 
factor of lgm. Using this version of the formula, the time necessary to compute all 
Cd(1 — 2m), 2 < 2m < M, using naive multiplication is 

0(M 3 VI)L(M) o W L(L>)°W + M 5 y/D(lgM + lg£>)° (1) ). 

This is much worse than the best algorithm based on (1) in terms of M, but it is 
better in terms of D. Also, except for one final division by c^ m , all of the arithmetic 
in this formula deals only with rational integers; unlike the previous formulas. Note 
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that the first term comes from the factoring process, while the second term comes 
from multiplications. 

It should be noted that the asymptotic running time of this algorithm is greatly 
improved by using Schonhage-Strassen fast multiplication or constant time multi- 
plication, in which cases the second term becomes smaller than the first and the 
running time becomes 

0(M 3 y/DUM)°^ L(L>)°W) 

This is still worse than using (1) in terms of M, but only by a subexponcntial 
factor. 

It should also be noted that (2) and (3) also present opportunities for time 
savings when computing zeta- values for multiple D in the same range of M, at a 
sacrifice of memory space. The controlling factor in the speed of the algorithm is 
the number of times that <72 TO -i(n) must be calculated. Note that in computing all 
Q(l — 2m), 5 < d < D, there can only be 0{m 2 D) different values of n. However, 
following the algorithm strictly, we would ordinarily make 0(m 2 D 3 / 2 ) calls to the 
subroutine that calculates this function. 

Thus if we compute all Q(l — 2m), 5 < d < D, storing values of CT2 m -i{n) 
as we compute them, and then repeat this process for each m in the range 2 < 
2m < M, the running time should be 0(DL(D) ^) in terms of D, rather than 
O(Z) 3 / 2 L(D)°^) as one would obtain following the algorithm strictly. This com- 
pares very favorably with the time of 0(D 2 ) in terms of D which holds for algo- 
rithms using (1). 

Since the exponent 2m — 1 used in the (Ji m -\(n) function changes as m does, 
we can dispose of the table when we change m. The table that we need to keep 
requires at most 0(M 3 D(lgM + lgD)) bits of storage, which could be a significant 
barrier. More efficient storage of the important information may be valuable here; 
we will discuss this somewhat more in Section 5. 

4. Computing the numbers C4 m ,; 
The integers c\ m j, are defined as follows. Let 

2k °° 

G fe = 1 - — Vcr fc _i(n)q" 

be the (normalized) Eisenstein series of order k for k = 6, 10, and 14. (For the 
general C2 mn ,i one also needs k — 0,4, and 8.) Let 

oo 

71=1 

(OO 
^(-l)"(2n+l) g "( n+1 )/ 2 
n=0 

be the discriminant series. Let r = L m /3J + 1 as before, and let 

oo 

Ti m = Gi2r-4m+2A r = ^ C im ^ n q n . 

n=—r 

Then C4m — Cim,Q, and the other Ci m ,i for 1 < I < r can also be read off as 
coefficients of T4 m . Luckily, the expression 12r — 4m + 2 only takes on the values 6, 
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10, and 14. (In the more general case we can define T 2m „ similarly; the expression 
12r — 2mn + 2 can take on the values 0, 4, and 8 in addition to those above.) 

The best algorithm known to the author for calculating these coefficients goes 
roughly as follows. At the start of the computations for Cr>(l — 2m), 2 < 2m < 
M, calculate G@, Gio, G14, and A -1 with the maximum number of coefficients 
necessary (about M/12). Instead of trying to compute all of the needed series A~ r 
at once, we calculate it as a running product which only needs to be updated when 
r changes. Then, whenever m changes, we multiply truncated versions Gi2 r -4m+2 
and A~ r (with about m/6 coefficients each) to find the required coefficients of T 4m . 

The series A -1 can also be expressed as 



where p(n) takes on integer values and is well-known as the partition function from 
additive number theory. Hardy and Ramanujan proved an asymptotic expression 
for p(n) which shows that \gp(n) is of order y/n. (See, for example, Chapter 14 
of [1].) From this it is easy to show that the coefficients of A _r take at most 
0(M 1 - 5 ) bits of storage each, as do the coefficients of G\2r-4m+2 and T^ m . Thus 
the storage required for all of the computation of &± m j necessary for a fixed m is 
0(M 2 - 5 ). The resulting table, which is of course independent of d, can be used to 
compute Cd(l ~ 2m) for any range of d and then disposed of when m is changed. 

As far as the time for this algorithm is concerned, computing the tables necessary 
for all Cd(l - 2m), 2 < 2m < M, 5 < d < D, takes time 0(M 5 ) with naive multi- 
plication of integers and also of polynomials. By using FFT methods to multiply 
polynomials the time can be reduced to 0{M 3 - 5 lgMlglgM) with fast multiplica- 
tion of integers and 0(M 2 lgM) with constant time multiplication of integers. (The 
use of FFT methods here was suggested by A.O.L. Atkin and Will Galway.) This is 
within our previously established time bounds in the naive and constant time cases; 
however in the fast multiplication case it could add to the total asymptotic time 
in terms of M, which for the previous parts of the algorithm was established as 
0(M 3 L(M)° (1) ) in terms of M. On the other hand it should be noted that these 
calculations only need to be done once per value of m no matter how many values 
of d one is examining. Also the constant involved in the 0(M 3 5 lg M lg lg M) seems 
to be quite good in practice compared to that in the 0(M 3 L(M)°^). 



Table 1 presents the various algorithms, for comparison. We present the asymp- 
totic order of the running time to compute £d(1 — 2m), 2 < 2m < M, for a given 
D, using the three methods of multiplication discussed earlier. 

The factor of lg M in the times for algorithms based on (2) has been included to 
emphasize that these algorithms are slower than those based on (3), even though 
the factor of lgM could be absorbed into that of L(M)° (1) . 
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Table 1. Comparison of algorithms for Cd(1 — 2m), 2 < 2m < M 



Equation 
used 



Multiplication Time order 



M i D{\gM + lgL>)°W 
M 3 D(\gM + lgD)° (1) 
M 2 L>(lgM + lgD) 0(1) 
M 3 D(\g D + lg M)° (1) 
M 3 L>(lg D + lg M)° (1) 
M 2 £)(lgM + rgD)° (1) 
M 3 V^L(M) ( 1 )Lp) « 1 

+M 5 VZ>(lgM 
M 3 VDL(M) ^L(D)°^>1 
M 3 VDL(M) WL(D) W 1, 

m 3 Vdl(m)°^ 

+M 5 

M 3 VflL(M)°( 1 )L(D)°( 1 ) 
M 3 ^DL(M)°( 1 )Lp)°( 1 ) 



(1) 
(1) 
(1) 

(1) from [6] 
(1) from [6] 
(1) from [6] 

(2) 

(2) 
(2) 
(3) 

(3) 
(3) 



naive 
fast 

constant 

naive 

fast 

constant 
naive 

fast 

constant 
naive 

fast 

constant 



Lp)°« 
D{\gM 



gM 

IgD)O(i) 
gM 
gM 

IgD)O(i) 



Table 2. Calculating ( D (l - 2m), 2 < 2m < M 



D 


M 


time (1) from [6] 


(stacksizc) 


time (3) 


(stacksize) 


5 


100 


3.155 


(10M) 


.838 


(10M) 


101 


100 


55.561 


(10M) 


3.758 


(10M) 


501 


100 


4:50.282 


(10M) 


12.480 


(10M) 


1001 


100 


10:52.411 


(10M) 


20.670 


(10M) 


5001 


100 


48:27.107 


(12M) 


1:43.548 


(10M) 


5 


500 


2:05.670 


(4M) 


8:11.603 


(4M) 


101 


500 


42:38.612 


(4M) 


1:18:26.615 


(4M) 


501 


500 


3:48:18.615 


(8M) 


4:45:20.438 


(4M) 


1001 


500 


7:49:56.048 


(12M) 


7:49:00.783 


(4M) 


5 


1000 


10:05.903 


(4M) 


3:55:46.908 


(4M) 


5 


2000 


1:14:01.992 


(16M) 


118:17:46.020 


(4M) 



We also provide, in Tables 2 and 3, tables of actual timings for some of the 
algorithms, using naive multiplication. The times were measured on a Sun SPARC 
Ultra- 1 computer using the GP-Pari interpreted language. (See [2].) The data 
computed by these programs will be analyzed in a future paper. 

Table 2 measures the time to compute Cd(1 — 2m), 2 < 2m < M, for a given 
D, in hours, minutes, and seconds. The number in parentheses indicates the size of 
the stack used, in bytes. These numbers are only a very rough guide to the actual 
amount of memory used. 

Table 3 measures the time to compute £d(l — 2m), 2 < 2m < M, 5 < d < D. We 
use the algorithm based on (3), both with and without keeping a table of <72m-i{ n ) 
as described earlier. The units and stack size numbers should be interpreted as in 
Table 2. 
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Table 3. Calculating Q(l - 2m), 2<2m<M,5<d<D 



D 


M 


time (3) 


(stacksize) 


time (3) with table 


(stacksize) 


100 


100 


1:19.638 


(4M) 


1:27.983 


(4M) 


500 


100 


18:34.377 


(4M) 


13:57.166 


(8M) 


1000 


100 


1:03:50.464 


(4M) 


37:52.552 


(12M) 


5000 


100 


26:39:46.109 


(4M) 


7:10:44.870 


(64M) 



As one can see, memory usage for algorithms based on (3) with a table of values 
C2m-i(") goes up quite quickly, and even so a lot of redundant work is being done. 
For one thing, the same numbers n will have to be factored repeatedly for different 
values of 2m — 1, but they will lead to different values of <72 m _i(n) so the actual 
factorization would need to be stored and not just a function value. This would 
be even more memory intensive. Carl Pomerance has suggested some approaches 
to this, including using a cache rather than a complete table, and storing only 
the largest prime factor rather than a complete factorization. The question of 
a cache leads naturally to the question of which numbers will appear as integer 
values of (k D — x 2 )/A as k and D vary, and how often. Henri Cohen, in [5], gives 
some variations on Sicgcl's formula which could cut down on the number of times 
&2m-i(n) needs to be computed. Algorithms based on these might provide a linear 
speedup over the algorithms presented here, but the asymptotic behavior would 
probably be the same. 

The anonymous reviewer has suggested that some of the arithmetic could pos- 
sibly be speeded up by the use of modular techniques and the Chinese remainder 
theorem. This certainly deserves more consideration. 

Another prospect for future work is the analysis of "first-hit" versions of these 
algorithms, namely determining how long we should expect to search before finding, 
say, the first D-irregular prime larger than a certain bound for D in a given range. 
This might be particularly useful for cryptographic applications, in which we would 
be explicitly looking for a class group (or a small number of them) with a hard 
discrete logarithm problem. This will be explained in more detail in [9]. 
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